** Data reading and variable selection from raw data
** Growth of American Families 1955

*** Note that sample is only composed of females

** 01. Reading data **

cap log close
clear all
set more off
cd /*insert your work directory here*/
use /*read your data here*/ 


** 02. Constructing year and country variables **

ge year=1955
lab var year "survey year"

ge country=840
lab var country "ISO country code"
//US: 840 (see "ISO Country Codes.pdf")


** 03. ID variables **

ge pid=INT_NUM
lab var pid "person id"
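
// Optional check, not part of the original script: confirm that pid uniquely
// identifies respondents. This assumes INT_NUM is a unique interview number,
// which the codebook should confirm. "capture noisily" reports a failure
// without stopping the do-file.
capture noisily isid pid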

** 04. Basic Demographics (Sex and Age/birth year) **

//the sample contains only married women, so the raw data have no gender variable; we create one anyway.

ge sex = 2 if pid > 0 & pid < .
lab var sex "sex"
lab def sex 1 "male" 2 "female"
lab val sex sex

ge age=AGE
lab var age "age"

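// approximate birth year: survey year minus age at interview
ge birthyr = year - age
lab var birthyr "birth year (survey year minus age)"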

** 05. Siblings **
//note that sibling category of "2 siblings" is actually listed as "2 or N.A." in codebook

ge nsibs=PAR_SIB
lab var nsibs "number of siblings"


** 06. Own education **

rename ED educ_cat

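// assign approximate years of schooling to each education category
ge educ_yrs = .
lab var educ_yrs "years of education (assigned from educ_cat)"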
replace educ_yrs = 4 if educ_cat == 1
replace educ_yrs = 6 if educ_cat == 2
replace educ_yrs = 7 if educ_cat == 3
replace educ_yrs = 8 if educ_cat == 4
replace educ_yrs = 10 if educ_cat == 5
replace educ_yrs = 12 if educ_cat == 6
replace educ_yrs = 14 if educ_cat == 7
replace educ_yrs = 16 if educ_cat == 8
replace educ_yrs = 18 if educ_cat == 9
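
// Optional check, a sketch rather than part of the original workflow:
// cross-tabulate the category codes against the assigned years to confirm
// that every nonmissing educ_cat received a value. This assumes educ_cat
// takes only the codes 1-9 handled above; any other code will show up with
// educ_yrs missing.
tab educ_cat educ_yrs, missing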


//label respondent education

lab var educ_cat "highest level of education completed"


** 07. Parents' education: Father and/or Mother **

//parental education not available


** 08. Own occupation **

rename EMP_OCCAM occ_short

rename EMP_OCCB4MAR occ_short_b4marriage

lab var occ_short "Last or present occupation after marriage, aggregated codes"

lab var occ_short_b4marriage "Last occupation before marriage, aggregated codes"

// set codes 0 and 99 to missing
recode occ_short_b4marriage occ_short (99=.) (0=.)


** 09. Parents' occupation **

rename PAR_OCC_F faocc
lab var faocc "father's occupation"


** 10. Tabulate the Identified Variables **

log using /*insert your work directory here*/, replace text

** Data reading and variable selection from raw data
** Growth of American Families 1955

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs, d

** R's Own Education **
tab1 educ_cat

** R's Own Occupation **
tab1 occ_short occ_short_b4marriage

** Parents' Occupation **

tab1 faocc

log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nsibs ///
	 educ_cat educ_yrs ///
	 occ_short occ_short_b4marriage ///
	 faocc


** 12. Save the Data File **

saveold /*insert your work directory here*/, replace


** 13. Create ISCED Education Variable **

ge educ_ISCED = .
replace educ_ISCED = 000 if educ_cat == 1
replace educ_ISCED = 100 if educ_cat == 2
replace educ_ISCED = 100 if educ_cat == 3
replace educ_ISCED = 100 if educ_cat == 4
replace educ_ISCED = 200 if educ_cat == 5
replace educ_ISCED = 300 if educ_cat == 6
replace educ_ISCED = 500 if educ_cat == 7
replace educ_ISCED = 600 if educ_cat == 8
replace educ_ISCED = 700 if educ_cat == 9
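
// Optional check, a sketch rather than part of the original workflow:
// cross-tabulate educ_cat against educ_ISCED to review the mapping before
// saving. This assumes educ_cat takes only the codes 1-9 handled above; any
// other code will show up with educ_ISCED missing.
tab educ_cat educ_ISCED, missing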

** 14. Save the Data File **

saveold /*insert your work directory here*/, replace
